Abstract



In this paper, we consider formal series associated with events, pro- 
files derived from events, and statistical models that make predictions 
about events. We prove theorems about realizations for these formal 
series using the language and tools of Hopf algebras. 

Keywords: realizations, formal series, learning sets, data mining, 
1 Introduction



Many data mining problems can be formulated in terms of events, profiles, 
models and predictions. As an example, consider the problem of predict- 
ing credit card fraud. In this application, there is a sequence of credit card 
transactions (called the learning set), each of which is associated with a credit 
card account and some of which have been labeled as fraudulent. The goal 
is to use the learning set to build a statistical model that predicts the likeli- 
hood that a credit card transaction is associated with a fraudulent account. 
Information about each credit card transaction is aggregated to produce a 
statistical profile (or state vector) about each credit card account. The profile 
consists of features. Applying the model to the profile produces a prediction 
about whether the account is likely to be fraudulent. Note that we can think 
of this example as a map from inputs (events) to outputs (predictions about 
whether the associated account is fraudulent). Given such an input-output 
map, we can ask whether there is a "realization" in which there is a state 
space of profiles (corresponding to accounts) in which each event updates the 
corresponding profile. We will see how to make this precise below. 
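The flow from events to profiles to predictions can be sketched in a few
lines. This is only an illustration: the two features, the threshold rule,
the account names and the amounts are all invented here and are not part of
the paper's model.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Running features for one account (hypothetical feature set)."""
    count: int = 0
    total: float = 0.0

    def update(self, amount: float) -> None:
        # Each event updates the profile of the account it belongs to.
        self.count += 1
        self.total += amount


def score(profile: Profile, threshold: float = 500.0) -> bool:
    """Toy model: flag an account whose mean amount exceeds the threshold."""
    return profile.count > 0 and profile.total / profile.count > threshold


def run(events):
    """Map a sequence of (pid, amount) events to one prediction per event."""
    profiles = {}
    predictions = []
    for pid, amount in events:
        profile = profiles.setdefault(pid, Profile())
        profile.update(amount)
        predictions.append((pid, score(profile)))
    return predictions


events = [("acct-1", 20.0), ("acct-2", 900.0),
          ("acct-1", 35.0), ("acct-2", 1200.0)]
predictions = run(events)
```

Here the dictionary `profiles` plays the role of the state space, and `run`
is the input-output map whose realization is discussed in the text.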

Usually several different fraud models are developed and compared to one
another. Each fraud model is associated with a misclassification rate,
which is the percentage of fraudulent accounts that remain undetected. For
many data mining applications, especially large-scale applications, we do
not have a single learning set, but rather a collection of learning sets.

In this paper, we abstract this problem and use the language and tools of 
Hopf algebras to study it. To continue the example above, we abstract credit 
card transactions as events; state information about credit card accounts 
as profiles; credit card account numbers as profile IDs or PIDs; statistical 
models predicting the likelihood that a credit card account is fraudulent as 
models; a sequence of credit card transactions each of which is labeled either 
valid or fraudulent as learning sets of labeled events; and the accuracy rate 
of the credit card fraud model as the classification rate of the model. 

We are interested in the following setup. Consider a collection $C$,
possibly infinite, of labeled learning sets $w$ of events. For a labeled
learning set $w$, we can build a model. Each model has a classification
rate $p_w$. This information can be summarized in a formal series

\[ \sum_{w \in C} p_w \, w. \]

In this paper, we prove some theorems about these formal series using the
language of Hopf algebras.

We now give the precise definitions we need. A labeled event is an event,
together with a Profile Identifier (PID) and a label. Fix a set $D$ of
labeled events. We define a labeled learning set of events to be an element
of $W(D)$, the set of words $d_1 \cdots d_k$ of elements $d_i \in D$. If $k$
is a field, then $H = kW(D)$ is a $k$-algebra with basis $W(D)$. In this
paper we study formal series of the form

\[ p = \sum_{w \in W(D)} p_w \, w. \]

By a formal series, we mean a map

\[ H \to k, \]

associating to each element $w \in W(D)$ the series coefficient $p_w$. The
coefficient $p_w$ is the classification (or misclassification) rate for the
learning set of the events in $w$. Formal series occur in the formal theory
of languages, automata theory, control theory, and a variety of other areas.
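As a concrete picture of a formal series as a map $H \to k$, the sketch
below stores one coefficient per word and extends it linearly to finite
linear combinations of words. The event names `d1`, `d2` and the rates are
made up for illustration, and treating unlisted words as having coefficient
0 is an assumption of the sketch.

```python
from fractions import Fraction

# A word is a tuple of events; the series stores one coefficient p_w per word.
series = {
    ("d1",): Fraction(9, 10),
    ("d1", "d2"): Fraction(4, 5),
}


def coefficient(p, word):
    """Coefficient p_w; words not listed are treated as having coefficient 0."""
    return p.get(tuple(word), Fraction(0))


def evaluate(p, element):
    """Extend p linearly to an element of H = kW(D), given as {word: scalar}."""
    return sum((c * coefficient(p, w) for w, c in element.items()), Fraction(0))


# The element 2*(d1) - (d1 d2) + 3*(d2) of H.
h = {("d1",): Fraction(2), ("d1", "d2"): Fraction(-1), ("d2",): Fraction(3)}
value = evaluate(series, h)
```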

There is a more concrete realization of a model that we now describe.
This requires a space $X$ whose points $x \in X$ we interpret as profiles or
states, which abstract the features used in a model. We can now define a
model as a function from a space $X$ of profiles that assigns a label (in
$k$) to each element $x \in X$:

\[ f : X \to k. \]

Notice that given an initial profile $x \in X$ associated with a PID, a
sequence of events associated with a single PID will sweep out an orbit in
$X$, since each event will update the current profile in $X$ associated
with the PID. In the paper, we usually call the space $X$ the state space
and the initial profile the initial state.

Fix a formal series $p$. We investigate a standard question: given a formal
series $p$ built from the events $D$, is there a state space $X$, a
(classification) model

\[ f : X \to k, \]

and a set of initial states that yield $p$? A result answering this
question is called a realization theorem. The state space captures the
"essential" information in the data which is implicit in the series $p$.
The formal definition is given below.

Realization theorems use a finiteness condition to imply that the infinite
object can be represented by a finite state space. One of the most familiar
realization theorems is the Myhill-Nerode theorem. In this case, the
infinite object is a formal series of words forming a language; the
finiteness condition is the finiteness of a right invariant equivalence
relation, and the state space is a finite automaton. In the case of data
mining, the infinite object is a formal series of learning sets comprising a
series of experiments, the finiteness condition is described by the finite
dimensionality of a span of vectors, and the state space is $\mathbb{R}^n$.

The Myhill-Nerode theorem and more generally languages, formal series, 
automata, and finiteness conditions play a fundamental role in computer 
science. Our goal is to introduce analogous structures into data mining. 

We now briefly recall the Myhill-Nerode theorem following [U page 65]. 
Let the set $D$ be an alphabet, $W(D)$ be the set of words in $D$, and
$L \subseteq W(D)$ be a language. A language $L$ defines an equivalence
relation $\sim$ as follows: for $u, v \in W(D)$, $u \sim v$ if and only if
for all $w \in W(D)$ either both or neither of $uw$ and $vw$ are in $L$. An
equivalence relation $\sim$ is called right invariant with respect to
concatenation in case $u \sim v$ implies $uw \sim vw$ for all $w \in W(D)$.

Theorem 1.1 (Myhill-Nerode) The following are equivalent: 

1. L is the union of a finite number of equivalence classes generated by a 
right invariant equivalence relation. 

2. The language $L \subseteq W(D)$ is accepted by a finite automaton.
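The equivalence relation behind the theorem can be explored numerically.
The sketch below groups words $u$ by which bounded-length suffixes $w$ put
$uw$ in $L$. The example language (words over {a, b} with an even number of
a's) and the length bounds are assumptions made for illustration, and a
bounded search only approximates the true equivalence classes; it is not a
proof of finiteness.

```python
from itertools import product

ALPHABET = "ab"


def in_language(word: str) -> bool:
    """Example language L: words with an even number of 'a' (illustrative)."""
    return word.count("a") % 2 == 0


def words_up_to(n: int):
    """Yield every word over ALPHABET of length at most n."""
    for length in range(n + 1):
        for letters in product(ALPHABET, repeat=length):
            yield "".join(letters)


def nerode_classes(max_prefix: int, max_suffix: int):
    """Group prefixes u by the membership pattern of uw over bounded suffixes w."""
    suffixes = list(words_up_to(max_suffix))
    classes = {}
    for u in words_up_to(max_prefix):
        signature = tuple(in_language(u + w) for w in suffixes)
        classes.setdefault(signature, []).append(u)
    return list(classes.values())


classes = nerode_classes(max_prefix=3, max_suffix=3)
```

For this language the search finds two classes, matching the two states
(even and odd number of a's seen so far) of the obvious automaton.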

We point out that a language $L \subseteq W(D)$ naturally defines a formal
series. Fix a field $k$ and the $k$-algebra $H = kW(D)$. Given a language
$L$, define the formal series $p$ as follows:

\[ p_w = \begin{cases} 1 & \text{if } w \in L, \\ 0 & \text{otherwise.} \end{cases} \]

Section 2 contains preliminary material. Section 3 constructs a finite state
space $X$ for the simple case of a formal series without profile identifiers
or labels. Section 4 proves a theorem about parametrized classifiers and
nearly best realizations. Section 5 contains our main realization theorem.

One of the goals of this paper is to provide an algebraic foundation for
some of the formal aspects of data mining. Other (non-algebraic) approaches
can be found in [6], [7] and []].

A short announcement of some of the results in this paper (without
proofs) appeared in [3].
2 Preliminaries



Let $D$ denote an event space. More precisely, an element of $D$ is a triple
whose first element is a Profile IDentifier (PID) chosen from a finite set
$\mathcal{I}$, whose second element is a label chosen from a finite set of
labels $\mathcal{L}$, and whose third element is an element of $S$, a set of
events associated with PIDs. In short,
$D = \mathcal{I} \times \mathcal{L} \times S$, where $\mathcal{I}$ is the
set of PIDs and $\mathcal{L}$ is the set of labels.

We use heavily the facts that $\mathcal{I}$ and $\mathcal{L}$ are finite
sets.

We assume that $S$ is a semigroup with unit 1 generated by
$S_0 \subseteq S$. For example, $S_0$ might be a set of transactions and $S$
might be sequences of transactions. Multiplication in $S$ might be
concatenation, or some operation related to the structure of the data
represented by $S$.

A labeled learning set is an element of $W(D)$, the set of words
$w = d_1 \cdots d_k$ of events in $D$.

A labeled learning sequence is a sequence $\{w_1, w_2, \ldots\}$ of labeled
learning sets; a corresponding formal labeled learning series is a formal
series

\[ \sum_{i} p_{w_i} \, w_i. \]



Let $H = kW(D)$ denote the vector space with basis $W(D)$, and $kS$ denote
the vector space with basis $S$. Then $H$ is an algebra whose multiplication
is induced by the semigroup structure of $W(D)$, which is simply
concatenation, and $U = kS$ is an algebra whose structure is induced by the
semigroup structure of $S$.

Let $\bar{H}$ denote the space of formal labeled learning series. For
$(i, \ell) \in \mathcal{I} \times \mathcal{L}$ define the map
$\pi_{(i,\ell)} : \bar{H} \to U^*$ as follows: first, define
$\pi_{(i,\ell)}(p)(s) = p((i, \ell, s))$ for $p \in \bar{H}$ and $s \in S$;
then, extend to $W(D)$ multiplicatively.

We have that $U = kS$ is a bialgebra, with coproduct given by
$\Delta(s) = 1 \otimes s + s \otimes 1$ for $s \in S_0$, and with
augmentation $\epsilon$ defined by $\epsilon(1) = 1$, $\epsilon(s) = 0$ for
all non-identity elements $s \in S$. We will view $S$ as acting on a state
space. Since $U$ is primitively generated, $U \cong U(P(U))$ (recall that
$P(U) = \{ x \in U \mid \Delta(x) = 1 \otimes x + x \otimes 1 \}$ is a Lie
algebra, and that $U(L)$ is the universal enveloping algebra of the Lie
algebra $L$ [5]). We put a bialgebra structure on $H$ by letting

\[ \Delta((i, \ell, s)) = \sum_{(s)} (i, \ell, s_{(1)}) \otimes (i, \ell, s_{(2)}), \]

where $\Delta(s) = \sum_{(s)} s_{(1)} \otimes s_{(2)}$, and
$\epsilon((i, \ell, s)) = \epsilon(s)$, for $i \in \mathcal{I}$,
$\ell \in \mathcal{L}$, $s \in S$, and extending multiplicatively to $W(D)$.
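As a small worked example of the coproduct on $U$, the following computes
$\Delta$ on a product of two generators. It assumes only that
$s_1, s_2 \in S_0$ are primitive and that $\Delta$ is multiplicative; the
names $s_1, s_2$ are introduced here for illustration.

```latex
\begin{align*}
\Delta(s_1 s_2) &= \Delta(s_1)\,\Delta(s_2) \\
  &= (1 \otimes s_1 + s_1 \otimes 1)(1 \otimes s_2 + s_2 \otimes 1) \\
  &= 1 \otimes s_1 s_2 + s_2 \otimes s_1 + s_1 \otimes s_2 + s_1 s_2 \otimes 1 .
\end{align*}
```

The two middle terms show that a product of primitives is in general not
primitive. They cancel in $\Delta(s_1 s_2 - s_2 s_1)$, which is consistent
with $P(U)$ being closed under the commutator bracket.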

A simple formal learning series is an element $p \in U^*$. We can think of a
simple learning series $p$ as an infinite series $\sum_{s \in S} c_s \, s$.
Essentially, a simple formal learning series is a formal labeled learning
series, but without the labels and PIDs.

3 Construction of the state space 

We are concerned with whether $p \in U^*$, or some finite set
$\{p_\alpha\} \subset U^*$, arises from a finite dimensional state space
$X$. The reason we work with a finite set of elements of $U^*$ rather than
with a single one is that this allows us to deal with individual profiles
that get aggregated into the full dataset.
Since $U$ is primitively generated, we know that $U \cong U(P(U))$.

Remark 3.1 If $H$ is any bialgebra, we have a left $H$-module action of $H$
on $H^*$ defined by $(h \rightharpoonup p)(k) = p(kh)$ for $p \in H^*$,
$h, k \in H$, and a right $H$-module action of $H$ on $H^*$ defined by
$(p \leftharpoonup h)(k) = p(hk)$ for $p \in H^*$, $h, k \in H$.
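A short check that these formulas do define module actions may be helpful.
It uses only the two defining formulas of the remark and associativity of
the multiplication in $H$; the element $h'$ is a second arbitrary element of
$H$ introduced for the computation.

```latex
\begin{align*}
\bigl(h \rightharpoonup (h' \rightharpoonup p)\bigr)(k)
  &= (h' \rightharpoonup p)(kh) = p(khh')
   = \bigl((hh') \rightharpoonup p\bigr)(k), \\
\bigl((p \leftharpoonup h) \leftharpoonup h'\bigr)(k)
  &= (p \leftharpoonup h)(h'k) = p(hh'k)
   = \bigl(p \leftharpoonup (hh')\bigr)(k).
\end{align*}
```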

The following definition is from [2]. 

Definition 3.2 We say that the simple formal learning series $p \in U^*$ has
finite Lie rank if $\dim P(U) \rightharpoonup p$ is finite.

Let $R$ be a commutative algebra with augmentation $\epsilon$, and let
$f \in R$. We say that $p \in U^*$ is differentially produced by the pair
$(R, f)$ if

1. there is a right $U$-module algebra structure $\cdot$ on $R$;

2. $p(u) = \epsilon(f \cdot u)$ for $u \in U$.
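The definition can be tried out on the simplest case. The sketch below
assumes, purely for illustration, that $U$ is generated by a single
primitive element $e$, that $R = k[x]$ with $e$ acting as $d/dx$, and that
the augmentation is evaluation at $x = 0$. None of these choices come from
the text; they are one instance of the definition.

```python
from fractions import Fraction
from math import factorial


def act(f, times=1):
    """Right action f . e^times on R = k[x], with e acting as d/dx.

    A polynomial is a list of coefficients [c_0, c_1, ...].
    """
    for _ in range(times):
        # Differentiate: c_i x^i becomes i * c_i x^(i-1); drop the constant slot.
        f = [i * c for i, c in enumerate(f)][1:]
    return f


def augmentation(f):
    """epsilon: evaluation at x = 0, that is, the constant term."""
    return f[0] if f else Fraction(0)


def produced_series(f, n):
    """The coefficient p(e^n) = epsilon(f . e^n)."""
    return augmentation(act(f, n))


f = [Fraction(1), Fraction(2), Fraction(0), Fraction(5)]  # 1 + 2x + 5x^3
values = [produced_series(f, n) for n in range(5)]
```

In this instance $p(e^n) = n! \, c_n$, where $c_n$ is the coefficient of
$x^n$ in $f$, so the series $p$ and the function $f$ determine each other.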

A basic theorem on the existence of a state space is the following, which
is a generalization of Theorem 1.1 in [2]. In this theorem, the state space
is a vector space with basis $\{x_1, \ldots, x_n\}$.

Theorem 3.3 Let $p_1, \ldots, p_r \in U^*$. Then the following are
equivalent:

1. $p_k$ has finite Lie rank for $k = 1, \ldots, r$;

2. there is an augmented algebra $R$ for which
$\dim (\mathrm{Ker}\,\epsilon)/(\mathrm{Ker}\,\epsilon)^2$ is finite, and
for all $k$, there is $f_k \in R$ such that $p_k$ is differentially produced
by the pair $(R, f_k)$;

3. there is a subalgebra $R$ of $U^*$ which is isomorphic to
$k[[x_1, \ldots, x_n]]$, the algebra of formal power series in $n$
variables, and for all $k$, there is $f_k \in R$ such that $p_k$ is
differentially produced by the pair $(R, f_k)$.
PROOF: We first prove that part (1) of Theorem 3.3 implies part (3). Given
$p_1, \ldots, p_r \in U^*$, we define three basic objects:

\begin{align*}
L &= \{ u \in P(U) \mid u \rightharpoonup p_k = 0 \text{ for all } k \}, \\
J &= UL, \\
J^\perp &= \{ q \in U^* \mid q(j) = 0 \text{ for all } j \in J \}.
\end{align*}

Since $L \subseteq P(U)$, it follows that $J$ is a coideal, that is, that
$\Delta(J) \subseteq J \otimes U + U \otimes J$. Therefore
$J^\perp \cong (U/J)^*$ is a subalgebra of $U^*$. We will show that
$J^\perp$ is isomorphic to a formal power series algebra.

Lemma 3.4 If $\dim \sum_k P(U) \rightharpoonup p_k = n$, then $J^\perp$ is a
subalgebra of $U^*$ satisfying

\[ J^\perp \cong k[[x_1, \ldots, x_n]]. \]
PROOF: Note that $L$ is the kernel of the map

\[ P(U) \to \bigoplus_k P(U) \rightharpoonup p_k, \qquad u \mapsto \bigoplus_k u \rightharpoonup p_k, \]

and $L$ has finite codimension $n$. Choose a basis $\{e_1, e_2, \ldots\}$ of
$P(U)$ such that $\{e_{n+1}, e_{n+2}, \ldots\}$ is a basis of $L$. Note that
if $\bar{e}_i$ is the image of $e_i$ under the quotient map
$P(U) \to P(U)/L$, then $\{\bar{e}_1, \ldots, \bar{e}_n\}$ is a basis for
$P(U)/L$. By the Poincare-Birkhoff-Witt Theorem, $U$ has a basis of the form

\[ \{ e_{i_1}^{\alpha_{i_1}} \cdots e_{i_k}^{\alpha_{i_k}} \mid i_1 < \cdots < i_k \text{ and } 0 < \alpha_{i_r} \}. \]

Since the basis $\{e_i\}$ of $P(U)$ has been chosen so that $e_i \in L$ for
$i > n$, it follows that the monomials
$\{ e_1^{\alpha_1} \cdots e_n^{\alpha_n} \mid \alpha_r \ge 0 \}$ are a basis
for a vector space complement to $J$. It follows that

\[ \{ \bar{e}_1^{\alpha_1} \cdots \bar{e}_n^{\alpha_n} \mid \alpha_1, \ldots, \alpha_n \ge 0 \} \]

is a basis for $U/J$. It now follows that the elements

\[ \frac{x^\alpha}{\alpha!} = \frac{x_1^{\alpha_1} \cdots x_n^{\alpha_n}}{\alpha_1! \cdots \alpha_n!} \]

are in $J^\perp \subseteq U^*$, where $x_i \in U^*$ is defined by

\[ x_i(e_{i_1}^{\alpha_{i_1}} \cdots e_{i_k}^{\alpha_{i_k}}) =
\begin{cases} 1 & \text{if } e_{i_1}^{\alpha_{i_1}} \cdots e_{i_k}^{\alpha_{i_k}} = e_i, \\ 0 & \text{otherwise.} \end{cases} \]

The subalgebra $J^\perp$ consists precisely of the closure in $U^*$ of the
span of these elements. In other words,

\[ J^\perp \cong k[[x_1, \ldots, x_n]], \]

completing the proof.

We will use the following facts from the proof of Lemma 3.4: suppose
that $\{e_1, \ldots, e_n, \ldots\}$ is a basis for $P(U)$ such that
$\{e_{n+1}, \ldots\}$ is a basis for $L$. Let $\{e^\alpha\}$ be the
corresponding Poincare-Birkhoff-Witt basis. Denote $J^\perp$ by $R$. Then
$R \cong k[[x_1, \ldots, x_n]]$, and
$x_1^{\alpha_1} \cdots x_n^{\alpha_n} / \alpha_1! \cdots \alpha_n!$ is the
element of the dual (topological) basis of $U^*$ to the
Poincare-Birkhoff-Witt basis $\{e^\alpha\}$ of $U$, corresponding to the
basis element $e_1^{\alpha_1} \cdots e_n^{\alpha_n}$.

We now collect some properties of the ring of formal power series $R$ which
will be necessary for the proof of Theorem 3.3.

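Lemma 3.5 Assume $p \in U^*$ has finite Lie rank, and let $R \subseteq U^*$,
$e^\alpha \in U$, and $x^\alpha \in R$ be as in Lemma 3.4. Define

\[ f = \sum_{\alpha = (\alpha_1, \ldots, \alpha_n)} c_\alpha x^\alpha \in R, \]

where $c_\alpha = p(e^\alpha)/\alpha!$. Then

1. $U$ measures $R$ to itself via $\leftharpoonup$;

2. $p(u) = \epsilon(f \leftharpoonup u)$ for all $u \in U$.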

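Proof: We begin with the proof of part (1). Since $U$ measures $U^*$ to
itself and $R \subseteq U^*$, we need show only that
$R \leftharpoonup U \subseteq R$. Take $r \in R$, $u \in U$ and $j \in J$.
We have $(r \leftharpoonup u)(j) = r(uj)$. Since $J$ is a left ideal,
$uj \in J$, so $r(uj) = 0$, so $r \leftharpoonup u \in J^\perp = R$. This
proves part (1).

We now prove part (2). Let
$e^\alpha = e_{i_1}^{\alpha_{i_1}} \cdots e_{i_k}^{\alpha_{i_k}}$ be a
Poincare-Birkhoff-Witt basis element of $U$. Since $e^\alpha \in J$ unless
$\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\}$, $p(e^\alpha) = 0$ unless
$\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\}$. Also
$\epsilon(f \leftharpoonup e^\alpha) = (f \leftharpoonup e^\alpha)(1) =
f(e^\alpha 1) = f(e^\alpha) = 0$ unless
$\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\}$. Now suppose
$\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\}$. We have in this case that
$p(e^\alpha) = \alpha! \, c_\alpha = f(e^\alpha) =
(f \leftharpoonup e^\alpha)(1) = \epsilon(f \leftharpoonup e^\alpha)$. Since
$\{e^\alpha\}$ is a basis for $U$, this completes the proof of part (2) of
the lemma.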



Corollary 3.6 Under the assumptions of Lemma 3.5, $f = p$.



Lemmas 3.4 and 3.5 yield that part (1) implies part (3) in Theorem 3.3.
It is immediate that part (3) implies part (2).

We now complete the proof of Theorem 3.3 by proving that part (2)
implies part (1).

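Let $x_1, \ldots, x_n \in \mathrm{Ker}\,\epsilon$ be chosen so that
$\{\bar{x}_1, \ldots, \bar{x}_n\}$ is a basis for
$(\mathrm{Ker}\,\epsilon)/(\mathrm{Ker}\,\epsilon)^2$. If $f \in R$ and
$u \in U$, then

\[ f \cdot u = q_0(u) 1 + \sum_{i=1}^{n} q_i(u) x_i + g(u), \]

where $q_i \in U^*$ and $g(u) \in (\mathrm{Ker}\,\epsilon)^2$. Let
$\ell \in P(U)$. Since $U$ measures $R$ to itself and
$\Delta(\ell) = 1 \otimes \ell + \ell \otimes 1$, the map
$f \mapsto f \cdot \ell$ is a derivation of $R$. Now let $f_k \in R$ be the
element such that

\[ p_k(u) = \epsilon(f_k \cdot u). \]

Then

\begin{align*}
f_k \cdot u\ell &= (f_k \cdot u) \cdot \ell \\
  &= q_{k,0}(u) \, 1 \cdot \ell + \sum_{j=1}^{n} q_{k,j}(u) \, x_j \cdot \ell + g_k(u) \cdot \ell.
\end{align*}

Since the map is a derivation, $1 \cdot \ell = 0$, and since
$g_k(u) \in (\mathrm{Ker}\,\epsilon)^2$,
$g_k(u) \cdot \ell \in \mathrm{Ker}\,\epsilon$. It follows that

\begin{align*}
(\ell \rightharpoonup p_k)(u) &= p_k(u\ell) \\
  &= \epsilon(f_k \cdot u\ell) \\
  &= \sum_{j=1}^{n} q_{k,j}(u) \, \epsilon(x_j \cdot \ell).
\end{align*}

Therefore $P(U) \rightharpoonup p_k \subseteq \sum_{j=1}^{n} k \, q_{k,j}$,
so $p_k$ has finite Lie rank. This completes the proof of Theorem 3.3.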

Definition 3.7 A series $p \in \bar{H}$ for which the set

\[ \{ p_{(i,\ell)} = \pi_{(i,\ell)}(p) \mid \ell \in \mathcal{L}, \ i \in \mathcal{I} \} \]

satisfies the conditions of Theorem 3.3 is called regular.

We have shown how to construct a state space $X$ and a right $U$-module
algebra $R$ of observations for a regular series.

Although the $R$ we have constructed is a power series algebra, for
applications we will often use some other right $U$-module algebra of
functions on $X$. We will assume that we have an action of $S$ on $X$ which
induces the action of $U$ on $R$, that is, that $R$ is a $U$-module algebra.



4 Learning sets of profiles and realizations 

Let $\mathcal{I}$ be a finite set of PIDs, and let $\mathcal{L}$ be a finite
set of labels. Let $H$ and $U$ be the bialgebras described in Section 2, $X$
be the corresponding state space as described in Section 3, and $R$ be a
right $U$-module algebra of functions from $X$ to $k$.

Definition 4.1 A classifier is a function $f : X \to \mathcal{L}$. A
learning set of profiles is a function
$\chi : \mathcal{I} \to \mathcal{L} \times X$, that is, a finite set
$\{(\ell_j, x_j)\}$, where $\ell_j \in \mathcal{L}$, $x_j \in X$, and
$j \in \mathcal{I}$.

Note that a classifier is a model as defined in Section 1. We denote the set
of classifiers by $\mathcal{F}$ and the set of learning sets of profiles by
$\mathcal{C}$. $W(D)$ acts on $\mathcal{C}$ as follows. If
$d = (i, \ell, s) \in D$ and $\chi = \{(\ell_j, x_j)\}$, define
$\chi \cdot d = \{(\ell_j, x_j) \cdot d\}$, where

\[ (\ell_j, x_j) \cdot d = \begin{cases} (\ell, x_j \cdot s) & \text{if } j = i, \\ (\ell_j, x_j) & \text{otherwise.} \end{cases} \]

That is, the event $d = (i, \ell, s)$ acts on the learning set of profiles
$\chi = \{(\ell_j, x_j)\}$ by acting on the individual points
$(\ell_j, x_j)$ as follows: if $j \ne i$ the point is unchanged; if $j = i$
the point $x_j$ is moved to $x_j \cdot s$ and the label is changed to
$\ell$.
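The action of events on a learning set of profiles can be sketched
directly. In this sketch a learning set of profiles is a dictionary from
PID to (label, state), and the action of $S$ on $X$ is passed in as a
function `move`; the dictionary representation, the name `move`, and the
numeric example are assumptions made for illustration.

```python
def act_event(chi, event, move):
    """Return chi . d for a learning set of profiles chi = {pid: (label, state)}.

    event is a triple (pid, label, s); move(state, s) is the assumed
    action of S on X. Profiles of other PIDs are left unchanged.
    """
    pid, label, s = event
    _, state = chi[pid]
    updated = dict(chi)  # copy, so the input learning set is not mutated
    updated[pid] = (label, move(state, s))
    return updated


def act_word(chi, word, move):
    """Act by a word d_1 ... d_k, one event at a time from left to right."""
    for event in word:
        chi = act_event(chi, event, move)
    return chi


# Hypothetical example: states are running totals and S acts by addition.
chi = {"p1": ("valid", 0), "p2": ("valid", 10)}
word = [("p1", "valid", 5), ("p2", "fraud", 100), ("p1", "fraud", 7)]
result = act_word(chi, word, move=lambda state, s: state + s)
```

Acting from left to right reflects that the action is a right action, so
that acting by $d_1 d_2$ is the same as acting by $d_1$ and then by $d_2$.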

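A pairing $\langle\langle f, \chi \rangle\rangle$ between classifiers and
learning sets of profiles can be given as follows. Let
$f : X \to \mathcal{L}$ be a classifier, and $\chi = \{(\ell_i, x_i)\}$ be a
learning set of profiles. Then

\[ \langle\langle f, \chi \rangle\rangle = \frac{|\{ i \in \mathcal{I} : f(x_i) = \ell_i \}|}{|\mathcal{I}|}. \]

Note that $0 \le \langle\langle f, \chi \rangle\rangle \le 1$. This pairing
is a measure of how well the classifier $f$ predicts the actual data
represented by $\chi$.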
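A minimal sketch of such a pairing follows. It assumes that the pairing is
the fraction of profiles whose stored label the classifier reproduces, which
is one natural choice satisfying the bound between 0 and 1; the threshold
classifier and the data are hypothetical.

```python
from fractions import Fraction


def pairing(classifier, chi):
    """Fraction of profiles (label, state) in chi whose label is reproduced.

    This formula is an assumption, chosen so that 0 <= pairing <= 1.
    """
    if not chi:
        raise ValueError("empty learning set of profiles")
    hits = sum(1 for label, state in chi.values() if classifier(state) == label)
    return Fraction(hits, len(chi))


def classifier(state):
    """Hypothetical threshold classifier on numeric states."""
    return "fraud" if state > 50 else "valid"


chi = {"p1": ("fraud", 12), "p2": ("fraud", 110), "p3": ("valid", 3)}
rate = pairing(classifier, chi)
```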
We define the notion of realization as follows. 
Definition 4.2 Let

\[ f : X \to \mathcal{L} \]

be a classifier, let

\[ \chi : \mathcal{I} \to \mathcal{L} \times X \]

be a learning set of profiles, and let $\langle\langle -, - \rangle\rangle$
be a pairing. We say that the triple $(X, f, \chi)$ is a realization of the
series $p \in \bar{H}$ if

\[ p_h = \langle\langle f, \chi \cdot h \rangle\rangle \]

for all $h \in W(D)$. Note that the pairing
$\langle\langle f, \chi \cdot h \rangle\rangle$ defined above is bounded; in
fact

\[ 0 \le \langle\langle f, \chi \cdot h \rangle\rangle \le 1. \]

Recall that $p = \sum_{h \in W(D)} p_h h$ is the formal series of which we
are studying realizations.

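Lemma 4.3 Fix a finite learning set $\chi$, and fix
$A \subseteq \mathbb{R}^n$. Suppose that there is a map
$M : A \to \mathcal{F}$ such that
$\langle\langle M(\alpha), \chi \cdot h \rangle\rangle$ is a bounded
function of $\alpha \in A$, and $p \in \bar{H}$ for which there is a state
space $X$ and a ring of functions $R$ as described in Section 3. Assume that
$p_h$, $h \in W(D)$, is bounded. Let

\[ \mathcal{M}(\alpha) = \sup_{h \in W(D)} | p_h - \langle\langle M(\alpha), \chi \cdot h \rangle\rangle |. \]

Then for all $\epsilon > 0$ there exists $\alpha_0 \in A$ such that
$| \mathcal{M}(\alpha_0) - \inf_{\alpha \in A} \mathcal{M}(\alpha) | < \epsilon$.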

Note that the hypothesis on M includes models which are polynomials, tree 
classifiers, neural nets, and splines. 

Proof:

Since everything in its definition is bounded, $\mathcal{M}(\alpha)$ exists
and is bounded. If $P : A \to \mathbb{R}$ is any bounded function, then
there is $\alpha_0 \in A$ such that $P(\alpha_0)$ is within $\epsilon$ of
$\inf_{\alpha \in A} P(\alpha)$.
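The last step is the defining property of the infimum, which can be spelled
out as follows. The symbol $m$ is introduced here for the infimum and is not
notation from the text.

```latex
\begin{align*}
& m := \inf_{\alpha \in A} P(\alpha) \text{ is finite, since } P \text{ is bounded.} \\
& \text{If } P(\alpha) \ge m + \epsilon \text{ for all } \alpha \in A,
  \text{ then } m + \epsilon \text{ is a lower bound larger than } m, \\
& \text{contradicting the choice of } m. \text{ Hence there is } \alpha_0 \in A \text{ with} \\
& m \le P(\alpha_0) < m + \epsilon,
  \quad\text{that is,}\quad | P(\alpha_0) - m | < \epsilon .
\end{align*}
```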

Note that for any realization $M(\alpha)$ of $p$, we have that

\[ \mathcal{M}(\alpha) = \sup_{h \in W(D)} | p_h - \langle\langle M(\alpha), \chi \cdot h \rangle\rangle | \]

measures how well $M(\alpha)$ realizes $p$, so that
$\inf_{\alpha \in A} \mathcal{M}(\alpha)$ is the lower bound for the
"goodness" of any realization. The lemma says that this lower bound can be
approximated arbitrarily closely.
Theorem 4.4 Let $p : kW(D) \to k$ be such that $p_h$ is bounded, and let
$M : A \to \mathcal{F}$ be a parametrized classifier such that
$\langle\langle M(\alpha), \chi \cdot h \rangle\rangle$ is a bounded
function of $\alpha$. Then for all $\epsilon > 0$ there is a realization
$p_0 = M(\alpha_0)$ of $p$ such that the "goodness" of the realization
afforded by $p_0$ is within $\epsilon$ of the lower bound, that is,
$| \mathcal{M}(\alpha_0) - \inf_{\alpha \in A} \mathcal{M}(\alpha) | < \epsilon$.

Proof:

Theorem 4.4 follows immediately from Lemma 4.3.

5 Parametrized realizations 

In this section we consider an event space D, a realizable labeled learning 
series p, a state space X, and an algebra of functions R from the state space 
X to k. 

Denote by $\mathcal{C}$ the set of learning sets of profiles and denote by
$\mathcal{F}$ the set of functions from $X$ to the finite set of labels
$\mathcal{L}$. Fix a vector space of parameters $A$, and a map

\[ M : A \to \mathcal{F} \]

giving a parametrized family of models.

In this section we study parametrized realizations of formal series
$p \in \bar{H}$ of learning sets.

Compare Definition 5.1 to Definition 4.2, in which realizations are defined.

Definition 5.1 A parametrized realization of a bounded function
$p \in \bar{H}$ is:

1. A vector space of parameters $A$.

2. A parametrized family of models $M : A \to \mathcal{F}$.

If $A$ is a finite dimensional vector space, we say that the realization is
$A$-finite.

Theorem 5.2 below gives a finiteness condition on the action of $A$ on
$p \in \bar{H}$ which gives an $A$-finite realization.

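For $f = M(\alpha) \in \mathcal{F}$ and $\ell \in \mathcal{L}$, let $f_\ell$
be defined by

\[ f_\ell(x) = \begin{cases} \ell & \text{if } f(x) = \ell, \\ * & \text{otherwise,} \end{cases} \]

where $*$ is unequal to any label $\ell \in \mathcal{L}$. Let $p_\ell(h)$ be
defined by

\[ p_\ell(h) = \langle\langle f_\ell, \chi \cdot h \rangle\rangle. \]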
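The functions $p_\ell$ split the pairing label by label. The computation
below is a sketch under the assumption that the pairing is the fraction of
profiles $(\ell_i, x_i)$ of a learning set $\chi$ with $f(x_i) = \ell_i$;
the bracket $[\,\cdot\,]$ denotes 1 when the enclosed condition holds and 0
otherwise, and $N$ denotes the number of profiles. Both are notation
introduced here.

```latex
\begin{align*}
\langle\langle f, \chi \rangle\rangle
  &= \frac{1}{N} \sum_{i} [\, f(x_i) = \ell_i \,] \\
  &= \frac{1}{N} \sum_{\ell \in \mathcal{L}} \sum_{i} [\, f(x_i) = \ell \text{ and } \ell_i = \ell \,] \\
  &= \sum_{\ell \in \mathcal{L}} \frac{1}{N} \sum_{i} [\, f_\ell(x_i) = \ell_i \,]
   = \sum_{\ell \in \mathcal{L}} \langle\langle f_\ell, \chi \rangle\rangle .
\end{align*}
```

The third equality uses that $*$ is not a label, so $f_\ell(x_i) = \ell_i$
holds exactly when $f(x_i) = \ell = \ell_i$. Applying this with
$\chi \cdot h$ in place of $\chi$ gives
$\langle\langle f, \chi \cdot h \rangle\rangle = \sum_{\ell} p_\ell(h)$.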

Theorem 5.2 Let $p \in H^*$ be a formal sum of learning sets and
$M : A \to \mathcal{F}$ a family of labeled models parametrized by $A$.
Assume:

1. there exists $f \in \mathrm{Im}\, M$ such that

\[ p(h) = \langle\langle f, \chi \cdot h \rangle\rangle; \]

2. $\{ \beta \in A^* \mid \beta \rightharpoonup p_\ell = 0 \}$ is a subspace
of $A^*$ of finite codimension which is closed in the compact open topology
for all $\ell \in \mathcal{L}$.

Then there exists an $A$-finite realization of $p$.

Note that Theorem 4.4 gives the existence of a realization which
approximates the desired one.

PROOF: We define three basic objects:

\begin{align*}
L_\ell &= \{ \beta \in A^* \mid \beta \rightharpoonup p_\ell = 0 \}, \\
J_\ell &= k[A^*] L_\ell, \\
J_\ell^\perp &= \{ q \in k[A^*]^* \mid q(j) = 0 \text{ for all } j \in J_\ell \}.
\end{align*}

We have that $J_\ell$ is a coideal in the Hopf algebra $k[A^*]$ generated by
primitive elements in $A^*$, that is, that
$\Delta(J_\ell) \subseteq J_\ell \otimes k[A^*] + k[A^*] \otimes J_\ell$.
Therefore $J_\ell^\perp \cong (k[A^*]/J_\ell)^*$ is a subalgebra of
$k[A^*]^*$. We will show that $J_\ell^\perp$ is isomorphic to a formal power
series algebra in finitely many variables.

From hypothesis (2) we have that $L_\ell^\perp = (A^*/L_\ell)^*$ is a finite
dimensional subspace of $A$.

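Lemma 5.3 If $\dim L_\ell^\perp = n_\ell$, then $J_\ell^\perp$ is a
subalgebra of $k[A^*]^*$ satisfying

\[ J_\ell^\perp \cong k[[\alpha_1, \ldots, \alpha_{n_\ell}]], \]

where $\{\alpha_1, \ldots, \alpha_{n_\ell}\}$ is a basis for $L_\ell^\perp$.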
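PROOF: The subspace $L_\ell$ is a closed subspace of $A^*$ of finite
codimension $n_\ell$, so that $(A^*/L_\ell)^* = L_\ell^\perp$ is a finite
dimensional subspace of $A$. Let $(A^*/L_\ell)^*$ have basis
$\{\alpha_1, \ldots, \alpha_{n_\ell}\}$. Choose $\beta_{\alpha_i} \in A^*$
with $\beta_{\alpha_i}(\alpha_j) = \delta_{ij}$. Now choose a basis $B$ of
$A^*$ such that
$B \supseteq \{\beta_{\alpha_1}, \ldots, \beta_{\alpha_{n_\ell}}\}$ and
$B' = B \setminus \{\beta_{\alpha_1}, \beta_{\alpha_2}, \ldots, \beta_{\alpha_{n_\ell}}\}$
is a basis of $L_\ell$. We have that $k[A^*]$ has a basis

\[ \{ \beta_{i_1}^{a_{i_1}} \cdots \beta_{i_k}^{a_{i_k}} \mid \beta_{i_r} \in B, \ i_1 < \cdots < i_k, \text{ and } 0 < a_{i_r} \}. \]

By the choice of the basis of $A^*$, $J_\ell$ will have a basis of the form

\[ \beta_{i_1}^{a_{i_1}} \cdots \beta_{i_k}^{a_{i_k}} \]

with at least one $\beta_{i_r} \in B'$. It follows that

\[ \{ \bar{\beta}_{\alpha_1}^{a_1} \cdots \bar{\beta}_{\alpha_{n_\ell}}^{a_{n_\ell}} \mid a_1, \ldots, a_{n_\ell} \ge 0 \}, \]

where we denote by $\bar{\beta}_{\alpha_k}$ the image of that element in
$k[A^*]/J_\ell$, is a basis for $k[A^*]/J_\ell$. It now follows that
elements of the form

\[ \alpha^a = \alpha_1^{a_1} \cdots \alpha_{n_\ell}^{a_{n_\ell}} \]

are in $J_\ell^\perp \subseteq k[A^*]^*$, and that $J_\ell^\perp$ consists
precisely of the closure in $k[A^*]^*$ of the span of such elements. In
other words,

\[ J_\ell^\perp \cong k[[\alpha_1, \ldots, \alpha_{n_\ell}]], \]

completing the proof.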

By Lemma 5.3 each $p_\ell$ depends on a finite dimensional space of
parameters $A_\ell$. Let $A_0$ be the finite dimensional subspace which is
spanned by the union of these finite dimensional subspaces. Since
$p(h) = \sum_{\ell \in \mathcal{L}} p_\ell(h)$, $p(h)$ depends only on
parameters in $A_0$.

Now $\langle\langle f, \chi \cdot h \rangle\rangle$ depends only on
parameters in $A_0$. We may choose the other parameters which are linearly
independent from $A_0$ arbitrarily. In other words we may choose $f_0$ so
that it depends only on the parameters in $A_0$.

This completes the proof of Theorem 5.2.